Big Data in metagenomics: Apache Spark vs MPI
Authors
Abstract
Similar resources
Static and Dynamic Big Data Partitioning on Apache Spark
Many of today's large datasets are organized as a graph. Due to their size, it is often infeasible to process these graphs using a single machine. Therefore, many software frameworks and tools have been proposed to process graphs on top of distributed infrastructures. This software is often bundled with generic data decomposition strategies that are not optimised for specific algorithms. In this ...
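The static and dynamic partitioning strategies proposed in that work are not reproduced here. As a rough illustration of the mechanism they build on, the minimal Scala sketch below hash-partitions a keyed edge list in Spark so that a later per-vertex aggregation reuses the partitioning instead of shuffling again; the input path, edge format, and partition count are assumptions made for the example.

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.sql.SparkSession

object StaticPartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("static-partitioning-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical edge list: one "srcId<TAB>dstId" pair per line.
    val edges = sc.textFile("hdfs:///data/graph/edges.tsv")
      .map(_.split("\t"))
      .map(parts => (parts(0).toLong, parts(1).toLong))

    // Static decomposition: hash-partition the edges by source vertex so that
    // all outgoing edges of a vertex end up in the same partition.
    val partitioned = edges.partitionBy(new HashPartitioner(64)).cache()

    // A per-vertex aggregation (here: out-degree) now reuses that partitioning,
    // so reduceByKey does not trigger another shuffle.
    val outDegrees = partitioned.mapValues(_ => 1L).reduceByKey(_ + _)
    outDegrees.take(10).foreach(println)

    spark.stop()
  }
}
```

Swapping HashPartitioner for a custom Partitioner is where an algorithm-specific decomposition, the gap the abstract above points at, would plug in.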
A comparison on scalability for batch big data processing on Apache Spark and Apache Flink
The large amounts of data have created a need for new fram...
An Information Theoretic Feature Selection Framework for Big Data under Apache Spark
With the advent of extremely high dimensional datasets, dimensionality reduction techniques are becoming mandatory. Among many techniques, feature selection has been growing in interest as an important tool to identify relevant features on huge datasets (both in number of instances and features). The purpose of this work is to demonstrate that standard feature selection methods can be paralleli...
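The information-theoretic selectors described in that framework ship as a separate Spark package and are not reproduced here. Purely as a stand-in, the sketch below uses the ChiSqSelector bundled with Spark MLlib to show the fit-and-transform pattern that a distributed filter-style feature selector follows; the toy DataFrame, column names, and number of retained features are assumptions made for the example.

```scala
import org.apache.spark.ml.feature.ChiSqSelector
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.sql.SparkSession

object FeatureSelectionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("feature-selection-sketch").getOrCreate()

    // Toy stand-in data: (label, feature vector). A real run would load a
    // high-dimensional DataFrame instead.
    val df = spark.createDataFrame(Seq(
      (0.0, Vectors.dense(0.0, 1.0, 3.0, 0.0)),
      (1.0, Vectors.dense(2.0, 0.0, 0.1, 4.0)),
      (1.0, Vectors.dense(2.5, 0.0, 0.2, 5.0))
    )).toDF("label", "features")

    // Filter-style selection: score each feature against the label in a
    // distributed pass, then keep only the top-k columns.
    val selector = new ChiSqSelector()
      .setNumTopFeatures(2)
      .setFeaturesCol("features")
      .setLabelCol("label")
      .setOutputCol("selectedFeatures")

    val reduced = selector.fit(df).transform(df)
    reduced.select("selectedFeatures").show(truncate = false)

    spark.stop()
  }
}
```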
Conquering Big Data with Spark
Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: extract "value" through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. To address this challenge, we have developed Berkeley Data Analytics Stack (BDAS), an open source data analytics stack for big data ...
Approximate Stream Analytics in Apache Flink and Apache Spark Streaming
Approximate computing aims for efficient execution of workflows where an approximate output is sufficient instead of the exact output. The idea behind approximate computing is to compute over a representative sample instead of the entire input dataset. Thus, approximate computing — based on the chosen sample size — can make a systematic trade-off between the output accuracy and computation effi...
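The sampling algorithms evaluated in that comparison are not reproduced here. The sketch below only illustrates the underlying idea, computing over a uniform Spark sample and scaling the aggregate back up, with the sampling fraction acting as the accuracy-versus-cost knob the abstract describes; the input path, fraction, and seed are assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession

object ApproximateCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("approximate-count-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Hypothetical input: one event per line.
    val events = sc.textFile("hdfs:///data/events/*.log")

    // Knob trading accuracy for cost: a smaller fraction is cheaper but noisier.
    val fraction = 0.01

    // Uniform sample without replacement: sample(withReplacement, fraction, seed).
    val sampled = events.sample(false, fraction, 42L)

    // Scale the sampled count back up to estimate the full-dataset count.
    val estimatedTotal = sampled.count() / fraction

    println(s"Estimated number of events: $estimatedTotal")
    spark.stop()
  }
}
```

Lowering the fraction makes the job cheaper but widens the error of the estimate, which is the trade-off the chosen sample size controls.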
Journal
Journal title: PLOS ONE
Year: 2020
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0239741